Introduction

In this project, we explored the world of instant noodles, aka ramen. Our data set comes from the Ramen Rater website, created by a single ramen enthusiast, and contains over 2500 reviews of all kinds of instant noodles one can find in stores (“THE BIG LIST” 2021).

The main problem we try to solve is to find which features are important for predicting whether a given ramen is good or not. We used OneHotEncoder() and CountVectorizer() to transform the data and a CatBoost model to make the predictions.

This is not a big question, but it is a good starting point for us in applying data science to a real-life problem. Considering how useful such a model could be for food lovers around the world when choosing nearby ramen restaurants, we think it is an interesting and meaningful question.

The Dataset

Each observation in the data set is a review of a single ramen product. The features are a review number (a larger number indicates a more recent review), the brand, the product’s name, the country of manufacture, the packaging style (such as cup or bowl), and a star rating, which ranges from 0 to 5 inclusive in 0.25 increments. Note that the stars reflect the reviewer’s personal taste and are therefore a very subjective score.

Exploratory Data Analysis

To understand the data better, we first visualize the distribution of the products’ countries of origin. It seems that most products come from China, South Korea, Japan, and the USA.

Figure 1. Origins of Ramen Products

There are many varieties, and the word cloud below displays the most common keywords in the ramen descriptions. Wow, these noodles come in so many flavors! They also come in different packaging: about half of the samples come as a pack, but some are sold in a bowl or tray, which is more convenient to eat from directly.

Figure 2. Word Cloud of Ramen Variety and Package Style Histogram

Let’s see how the ratings are distributed. It looks like most ramens are quite tasty! But a few received zero stars.

Figure 3. Histogram of Ratings

Methods

For preprocessing, we apply one-hot encoding to brand, country, and style, and use a bag-of-words representation for the variety feature. We drop the top ten and review # columns. We also convert the target, the star rating, to a binary variable at a threshold of 3.5, with 0 (Star < 3.5) meaning bad ramen and 1 (Star >= 3.5) meaning good ramen. This threshold was set by the original reviewer himself.
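The preprocessing steps described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the column names and the four toy rows are assumptions made up for the example, and the brands are placeholders.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Toy rows with assumed column names; none of these values come from the real data.
reviews = pd.DataFrame(
    {
        "Review #": [4, 3, 2, 1],
        "Brand": ["BrandA", "BrandB", "BrandA", "BrandC"],
        "Variety": [
            "Chicken Flavor Cup",
            "Spicy Chicken Ramen",
            "Fried Noodles Goreng",
            "Beef Flavor Bowl",
        ],
        "Style": ["Cup", "Pack", "Pack", "Bowl"],
        "Country": ["USA", "South Korea", "Indonesia", "USA"],
        "Stars": [3.25, 4.5, 5.0, 2.0],
        "Top Ten": [None, None, None, None],
    }
)

# One-hot encode the categorical columns and build a bag of words from the
# free-text variety column. Columns not listed here ("Review #", "Top Ten",
# "Stars") are dropped, because ColumnTransformer defaults to remainder="drop".
preprocessor = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Brand", "Country", "Style"]),
        ("bow", CountVectorizer(), "Variety"),  # a single column name, so the vectorizer receives 1-D text
    ]
)
X = preprocessor.fit_transform(reviews)

# Binary target: 1 for good ramen (Stars >= 3.5), 0 for bad ramen (Stars < 3.5).
y = (reviews["Stars"] >= 3.5).astype(int)

print(X.shape)
print(y.tolist())
```

Passing `"Variety"` as a plain string rather than a list matters here: CountVectorizer expects a one-dimensional sequence of documents, and a list of column names would hand it a two-dimensional frame instead.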

For model selection, we tried four types of model: CatBoost, Logistic Regression, Random Forest, and SVM. For feature selection, we tried two methods: the Boruta algorithm (a wrapper method) and recursive feature elimination. We finally chose CatBoost with the Boruta-selected features as our final setup. As Figure 4 shows, this combination has a high validation accuracy (0.758) and a small accuracy gap (0.046) between the validation and training sets, using only 77 features.
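To make the feature selection step more concrete, here is a small sketch of recursive feature elimination using scikit-learn's RFE class. It is only an illustration under stated assumptions: the data is synthetic, the logistic regression estimator and the target of five features are arbitrary stand-ins, and the Boruta algorithm that was finally chosen is not shown because it requires a separate third-party package.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 200 rows, 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=200,
    n_features=20,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

# RFE repeatedly fits the estimator and removes the weakest feature
# (step=1) until only n_features_to_select remain.
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=5,
    step=1,
)
selector.fit(X, y)

# support_ is a boolean mask over the original columns.
selected = [i for i, keep in enumerate(selector.support_) if keep]
print("kept feature indices:", selected)
```

The reduced matrix can then be obtained with `selector.transform(X)` and passed to whichever model is being compared.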

Figure 4. Test accuracy and Train/Test accuracy gap of different combinations

Five-fold cross-validation and random search are used to optimize the model. After the hyperparameter search, we use {'learning_rate': 0.078, 'max_depth': 5, 'n_estimators': 600} as our final parameters. Since the two classes are split roughly 0.7 vs. 0.3, we train the model with the class_weights parameter set to balanced.
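As a rough illustration of what balanced class weights do with a 0.7 vs. 0.3 split, the sketch below applies the common heuristic n_samples / (n_classes * class_count), which is the formula scikit-learn documents for its "balanced" option. Whether CatBoost normalizes its weights in exactly the same way is not confirmed here, and the label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels matching the 0.7 / 0.3 split described above.
labels = [1] * 700 + [0] * 300
weights = balanced_class_weights(labels)
print(weights)
```

Under this heuristic the minority class gets the larger weight, so that both classes contribute the same total weight to the loss.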

Results

On the test data, the CatBoost model gives a precision of 0.761, a recall of 0.954, and an F1 score of 0.847. That is good enough for a simple model like ours.

The two tables below show the validation and test performance, each alongside the training performance.

Table 1. Validation and Train Performance

valid_accuracy  train_accuracy  valid_f1   train_f1   valid_recall  train_recall  valid_precision  train_precision
0.7610759       0.8310918       0.8587465  0.8982122  0.9724576     0.9978814     0.7688442        0.8166450
0.7436709       0.8303006       0.8488806  0.8978815  0.9639831     0.9989407     0.7583333        0.8153913
0.7531646       0.8295095       0.8536585  0.8974542  0.9639831     0.9989407     0.7659933        0.8146868
0.7689873       0.8168513       0.8625235  0.8907761  0.9703390     1.0000000     0.7762712        0.8030625
0.7753165       0.8204114       0.8665414  0.8925189  0.9766949     0.9984110     0.7787162        0.8069349

Table 2. Test and Train Performance

test_accuracy  train_accuracy  test_f1    train_f1   test_recall  train_recall  test_precision  train_precision
0.7506329      0.8443038       0.8466926  0.9043546  0.954386     0.9855932     0.7608392       0.8354885
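The five rows of Table 1 can be summarized by averaging them. The sketch below does this for the accuracy columns only, assuming that each row corresponds to one cross-validation fold (the table itself does not label the rows).

```python
from statistics import mean

# Accuracy columns copied from Table 1, one value per row.
valid_accuracy = [0.7610759, 0.7436709, 0.7531646, 0.7689873, 0.7753165]
train_accuracy = [0.8310918, 0.8303006, 0.8295095, 0.8168513, 0.8204114]

mean_valid = mean(valid_accuracy)
mean_train = mean(train_accuracy)
gap = mean_train - mean_valid  # a larger gap suggests more overfitting

print(f"mean validation accuracy: {mean_valid:.3f}")
print(f"mean train accuracy:      {mean_train:.3f}")
print(f"train/validation gap:     {gap:.3f}")
```

The same pattern applies to the F1, recall, and precision columns.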

The plot below shows the confusion matrix of the CatBoost model.
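The test metrics in Table 2 all follow from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts are illustrative: they were back-calculated so that they reproduce the Table 2 values and were not read from the original plot, so treat them as an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion matrix counts."""
    precision = tp / (tp + fp)  # share of predicted-good ramen that is truly good
    recall = tp / (tp + fn)     # share of truly good ramen that is found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Assumed counts (good ramen is the positive class), chosen to match Table 2.
metrics = classification_metrics(tp=544, fp=171, fn=26, tn=49)
for name, value in metrics.items():
    print(f"{name}: {value:.7f}")
```

A recall near 0.95 combined with a precision near 0.76 indicates that the model labels most products as good, so it rarely misses a good ramen but often lets a bad one through.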

Interpretation

Shapley values are used here to explain the CatBoost model, as shown below. We can see that more features are associated with good ramen than with bad ramen. This makes sense because the ratings in the data set skew positive. Good ramen is usually associated with features like being from the brand Samyang Foods or Nissin, having the description keyword “goreng” (which refers to fried food in Southeast Asian cuisine (Wikipedia contributors 2021)), and being made in Japan or Indonesia. On the other hand, bad ramen is associated with features like being a cup noodle or being made in the United States. Now, whenever you are craving quick, simple, and tasty ramen noodles, remember to come back to this plot!

Critique

First of all, the amount of data used to build the model is relatively small, which may limit model performance. Secondly, the Top ten feature was not used in the analysis. In the future, we hope to make reasonable use of it after learning more data processing methods. Lastly, we recognize that the data set contains reviews by a single person, which makes our model very subjective and not generalizable to a general audience. One should proceed with caution when using these results as a shopping guide.

References

de Jonge, Edwin. 2018. Docopt: Command-Line Interface Specification Language. https://CRAN.R-project.org/package=docopt.
“THE BIG LIST.” 2021. THE RAMEN RATER. https://www.theramenrater.com/resources-2/the-list/.
Wikipedia contributors. 2021. “Goreng — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=Goreng&oldid=1012204146.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.